Several application domains require formal but flexible approaches to the comparison problem. Different process models that cannot be related by behavioral equivalences should be compared via a quantitative notion of similarity, which is usually achieved through approximation of some equivalence. While in the literature the classical equivalence subject to approximation is bisimulation, in this paper we propose a novel approach based on testing equivalence. As a step towards flexibility and usability, we study different relaxations taking into account orthogonal aspects of the process observations: execution time, event probability, and observed behavior. In this unifying framework, both the interpretation of the measures and the decidability of the verification algorithms are discussed.

1 Introduction 

The need for a comparison between process models is an important requirement in several practical domains, ranging from the model-based verification of web service composition [16] to security [6], safety [18], and performability [1] verification. For instance, equivalence checking can be helpful to compare a web service implementation with some desired qualitative/quantitative service description, to relate an implemented software architecture to a reference dependable architectural model, and to reveal the performability impact of one component on the whole system through the comparison of the two system views that are obtained by activating/deactivating the component (this is generally called noninterference analysis). Such a comparison must be based on a precise semantics and some notion of process equivalence. In the formal methods community, several notions of equivalence have been proposed, which differ from each other in their observational power (e.g. from the "weakest" trace equivalence to bisimulation, through the "intermediate" testing equivalence) and in their granularity (e.g. from nondeterministic versions of observational equivalences to the corresponding probabilistic, real-time, and stochastically timed extensions; see, e.g., [7] for a survey in the setting of process algebra).

In real-world applications perfect equivalence is usually hard to achieve when comparing models 
that describe either a system at different abstraction levels or alternative implementations of the same 
ideal system. Hence, adding the quantitative aspect to the comparison is of paramount importance to establish how closely these models match an expected behavior. This can be done, e.g., in a framework where fine-grained models specify probability distributions of events and/or their temporal
behaviors. Alternatively, functional models can be quantitatively compared with respect to a benchmark 
of testing scenarios. In any case, some kind of mathematical function is employed to estimate the degree 
of similarity between models that do not exhibit the same behavior. 

In this paper, a new approach to the approximation of behavioral equivalences is proposed in a 
process-algebraic setting in which three alternative dimensions - time, probability, and observed behavior 
- characterize what we mean by degree of similarity from the viewpoint of the expressive power of an 
external observer. 
First, we compare process models on the basis of their temporal behavior. Taking into account
the passage of time when observing the process execution requires the specification of durations. In 
our setting, system activities are associated directly with their durations, which are modeled through 
exponentially distributed random variables. In particular, the stochastic process governing the system 
evolution over time turns out to be a continuous-time Markov chain (CTMC). Second, by considering that 
timing aspects are dealt with probabilistically, it makes sense to compare process models with respect to 
probability distributions associated with their behaviors. Third, in order to analyze the observed behavior 
by abstracting from additional quantitative information such as time and probability, we introduce an 
approach that allows the distance between process models to be estimated with respect to their functional 
reaction to test-driven executions. 

All three dimensions are considered in the setting of a unifying semantics. More precisely, we employ a Markovian extension of testing equivalence [8], whose use represents a novelty in the field of
approximate analysis. The main reason for this choice is that Markovian testing equivalence provides in 
a natural and explicit way an ideal framework for the definition of degree of similarity with respect to 
time, probability, and observed behavior. To give some intuitive insights, Markovian testing equivalence 
compares processes in terms of probability of observing test-driven computations that somehow "pass" 
tests and satisfy temporal constraints about the amount of time needed to pass these tests. Therefore, by 
relaxing in turn each of these parameters - durations associated with specific computations, probability 
distributions of these computations, and kind of tests elucidating them - we easily obtain different notions 
of approximate testing equivalence under the three considered dimensions. Moreover, as will be shown, 
in this framework it is possible to combine the advantages of a decidable theory with the convenience of obtaining measures that can be easily interpreted in an activity-oriented setting.

The remainder of the paper is organized as follows. First, we introduce some background about the testing framework (Sect. 2), i.e. we recall the Markovian process calculus and Markovian testing equivalence, based on which we then formalize a notion of approximate testing equivalence from three different viewpoints (Sect. 3). The relaxed versions of Markovian testing equivalence based on time, probability, and observed behavior are presented separately and then combined in a unifying definition. Finally, the paper compares the approach with related work and outlines directions for future work (Sect. 4).

2 Markovian Testing Framework 

In this section, we recall Markovian testing equivalence in the setting of a Markovian process calculus 
that generates all the finite CTMCs with a minimum number of operators. For a complete survey of the 
main results concerning these topics, the interested reader is referred to 

2.1 Markovian Process Calculus 

In the Markovian process calculus that we consider (MPC for short), every action is exponentially timed and its duration is described by a rate $\lambda \in \mathbb{R}_{>0}$ defining the exponential distribution, such that the average duration of the action is given by the inverse of its rate. Formally, $Act = Name \times \mathbb{R}_{>0}$ is the set of actions of MPC, where $Name$ is the set of action names, ranged over by $a, b, \dots$, including the distinguished symbol $\tau$ denoting the invisible action.

The set of process terms of MPC is generated by the following syntax: 

$$P ::= 0 \mid \langle a,\lambda\rangle.P \mid P + P \mid A$$

where: 
• The inactive process $0$ represents a terminated process.

• The action prefix operator $\langle a,\lambda\rangle.P$ represents a process performing the durational action $\langle a,\lambda\rangle$ and then behaving as $P$.

• The alternative composition operator encodes choice. If several durational actions can be performed, the race policy is adopted, i.e. the fastest action is the one that is executed. The execution probability of each durational action is proportional to its rate, and the sojourn time associated with the related process term is exponentially distributed with rate given by the sum of the rates of the actions enabled by the term.

• A is a process constant defined by the possibly recursive equation A = P. 
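As a minimal sketch of the race policy described above, the following Python function computes, for a term enabling actions with given rates, the execution probability of each action and the average sojourn time of the term. The function name `race` is hypothetical, and exact rational arithmetic via `fractions.Fraction` is used only to keep the results readable.

```python
from fractions import Fraction


def race(rates):
    """Race policy: execution probability of each enabled action and average
    sojourn time of the term that enables them."""
    total = sum(rates)                      # total exit rate of the term
    probs = [r / total for r in rates]      # an action wins with probability rate/total
    avg_sojourn = 1 / total                 # sojourn time ~ Exp(total), mean 1/total
    return probs, avg_sojourn


# <a,2>.P + <b,3>.Q: a wins with probability 2/5, b with 3/5; mean sojourn 1/5.
probs, avg = race([Fraction(2), Fraction(3)])
print(probs, avg)
```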

We denote with $\mathcal{P}$ the set of closed and guarded process terms of MPC. The behavior of $P \in \mathcal{P}$ is given by the labeled multitransition system $[\![P]\!]$, where states correspond to process terms and transitions are labeled with actions. In particular, each transition has a multiplicity in order to keep track of the number of different proofs for the derivation of the transition. This is necessary because the idempotent law does not hold in the stochastic setting. Indeed, a term like $\langle a,\lambda\rangle.P + \langle a,\lambda\rangle.P$ is not the same as $\langle a,\lambda\rangle.P$ because of the race policy: the former leaves its initial state with total rate $2\lambda$, i.e. twice as fast on average, while the latter does so with rate $\lambda$.

From the labeled multitransition system $[\![P]\!]$ a CTMC can be easily derived by discarding the action
names from the labels and collapsing all the transitions between any pair of states into a single transition 
whose rate is the sum of the rates of the collapsed transitions. 
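The derivation of the CTMC can be sketched as follows, assuming a hypothetical encoding of the labeled multitransition system as a list of `(source, name, rate, target)` tuples in which a transition of multiplicity k simply occurs k times. The sketch also shows why multiplicities matter for the term $\langle a,\lambda\rangle.P + \langle a,\lambda\rangle.P$ discussed above; the state names are made up.

```python
from collections import defaultdict
from fractions import Fraction


def to_ctmc(transitions):
    """Collapse a labeled multitransition system into CTMC transition rates.

    `transitions` is a list of (source, name, rate, target) tuples; the action
    name is discarded and rates between the same pair of states are summed.
    """
    q = defaultdict(Fraction)               # Fraction() == 0
    for src, _name, rate, dst in transitions:
        q[(src, dst)] += rate
    return dict(q)


lam = Fraction(3)
# <a,λ>.P + <a,λ>.P: the a-transition to P has multiplicity 2.
doubled = [("Q", "a", lam, "P"), ("Q", "a", lam, "P")]
# <a,λ>.P: a single a-transition to P.
single = [("R", "a", lam, "P")]
print(to_ctmc(doubled), to_ctmc(single))   # rates 2λ = 6 and λ = 3
```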

Formally, the semantic rules for MPC are as follows: 

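$$\frac{}{\langle a,\lambda\rangle.P \xrightarrow{a,\lambda} P}$$

$$\frac{P_1 \xrightarrow{a,\lambda} P'}{P_1 + P_2 \xrightarrow{a,\lambda} P'} \qquad \frac{P_2 \xrightarrow{a,\lambda} P'}{P_1 + P_2 \xrightarrow{a,\lambda} P'}$$

$$\frac{A = P \qquad P \xrightarrow{a,\lambda} P'}{A \xrightarrow{a,\lambda} P'}$$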



2.2 Markovian Testing Equivalence 

Markovian testing equivalence is based on notions for process terms of MPC like exit rate - the rate at 
which we leave the state associated with the term - and computation - a sequence of transitions that can 
be executed starting from the state associated with the term. Below, we recall these two notions before 
introducing the testing scenario. 

Definition 2.1 Let $P \in \mathcal{P}$, $a \in Name$, and $C \subseteq \mathcal{P}$. The exit rate at which $P$ executes actions of name $a$ that lead to $C$ is defined through the non-negative real function:

$$rate(P,a,C) = \sum \{\!| \lambda \in \mathbb{R}_{>0} \mid \exists P' \in C.\; P \xrightarrow{a,\lambda} P' |\!\}$$

where the summation is taken to be zero whenever its multiset is empty. ■

If we sum up the rates of all the actions that a process term P can execute, we obtain the total exit 
rate of P. 
Definition 2.2 Let $P \in \mathcal{P}$. The total exit rate of $P$ is defined as $rate_t(P) = \sum_{a \in Name} rate(P,a,\mathcal{P})$. ■
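The two exit-rate functions can be sketched over a small, hand-written multitransition system. The dictionary `LTS` and the state names are hypothetical; each entry lists the outgoing transitions of a term as `(name, rate, target)` triples, with repetitions encoding multiplicity.

```python
from fractions import Fraction as F

# Hypothetical multitransition system: P = <a,1>.P1 + <a,2>.P2 + <b,3>.P1.
LTS = {
    "P": [("a", F(1), "P1"), ("a", F(2), "P2"), ("b", F(3), "P1")],
    "P1": [],
    "P2": [],
}


def rate(p, a, targets):
    """rate(P, a, C): sum of the rates of the a-transitions of p entering C (0 if none)."""
    return sum((lam for name, lam, dst in LTS[p] if name == a and dst in targets), F(0))


def rate_t(p):
    """rate_t(P): rate(P, a, all terms) summed over the action names a."""
    names = {name for name, _lam, _dst in LTS[p]}   # other names contribute 0
    return sum((rate(p, a, LTS.keys()) for a in names), F(0))


print(rate("P", "a", {"P1"}), rate("P", "a", {"P1", "P2"}), rate_t("P"))  # 1 3 6
```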

The length of a computation is the number of transitions occurring in it. We denote with $\mathcal{C}_f(P)$ the multiset of finite-length computations of $P \in \mathcal{P}$. Two distinct computations are independent of each other if neither is a proper prefix of the other one. In the remainder, we concentrate on finite sets of independent, finite-length computations. We now define the concrete trace, the probability, and the duration of an element of $\mathcal{C}_f(P)$, using $\_ \circ \_$ for sequence concatenation and $|\_|$ for sequence length.

Definition 2.3 Let $P \in \mathcal{P}$ and $c \in \mathcal{C}_f(P)$. The concrete trace associated with $c$ is the sequence of action names labeling the transitions of $c$, which is defined by induction on the length of $c$ through the $Name^*$-valued function:

$$trace(c) = \begin{cases} \varepsilon & \text{if } |c| = 0 \\ a \circ trace(c') & \text{if } c = P \xrightarrow{a,\lambda} c' \end{cases}$$

where $\varepsilon$ is the empty trace. ■

Definition 2.4 Let $P \in \mathcal{P}$ and $c \in \mathcal{C}_f(P)$. The probability of executing $c$ is the product of the execution probabilities of the transitions of $c$, which is defined by induction on the length of $c$ through the $\mathbb{R}_{]0,1]}$-valued function:

$$prob(c) = \begin{cases} 1 & \text{if } |c| = 0 \\ \frac{\lambda}{rate_t(P)} \cdot prob(c') & \text{if } c = P \xrightarrow{a,\lambda} c' \end{cases}$$

The probability of executing a computation in $C \subseteq \mathcal{C}_f(P)$, whenever $C$ is finite and all of its computations are independent of each other, is defined as:

$$prob(C) = \sum_{c \in C} prob(c)$$ ■

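Definition 2.5 Let $P \in \mathcal{P}$ and $c \in \mathcal{C}_f(P)$. The stepwise average duration of $c$ is the sequence of average sojourn times in the states traversed by $c$, which is defined by induction on the length of $c$ through the $(\mathbb{R}_{>0})^*$-valued function:

$$time(c) = \begin{cases} \varepsilon & \text{if } |c| = 0 \\ \frac{1}{rate_t(P)} \circ time(c') & \text{if } c = P \xrightarrow{a,\lambda} c' \end{cases}$$

where $\varepsilon$ is the empty stepwise average duration. We also define the multiset of computations in $C \subseteq \mathcal{C}_f(P)$ whose stepwise average duration is not greater than $\theta \in (\mathbb{R}_{>0})^*$ as:

$$C_{\le\theta} = \{\!| c \in C \mid |c| \le |\theta| \wedge \forall i = 1,\dots,|c|.\; time(c)[i] \le \theta[i] |\!\}$$

Moreover, we denote by $C^l$ the multiset of computations in $C \subseteq \mathcal{C}_f(P)$ whose length is equal to $l \in \mathbb{N}$. ■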
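Definitions 2.3 to 2.5 can be sketched in Python as follows. We assume a hypothetical encoding of a computation as a list of `(source, name, rate)` steps, together with a dictionary `rate_t` giving the total exit rate of each source term; the example term $\langle a,2\rangle.\langle b,4\rangle.0 + \langle c,2\rangle.0$ and its state names are made up for illustration.

```python
from fractions import Fraction as F


def trace(c):
    """Concrete trace (Def. 2.3): the action names along computation c."""
    return [name for _src, name, _lam in c]


def prob(c, rate_t):
    """Def. 2.4: product of lam / rate_t(source) over c; 1 for the empty computation."""
    p = F(1)
    for src, _name, lam in c:
        p *= lam / rate_t[src]
    return p


def time(c, rate_t):
    """Def. 2.5: stepwise average duration, i.e. 1 / rate_t(source) at each step."""
    return [1 / rate_t[src] for src, _name, _lam in c]


def within(C, theta, rate_t):
    """C_{<=theta}: computations not longer than theta and stepwise within it."""
    return [c for c in C
            if len(c) <= len(theta)
            and all(t <= th for t, th in zip(time(c, rate_t), theta))]


def of_length(C, l):
    """C^l: the computations of C whose length is exactly l."""
    return [c for c in C if len(c) == l]


# Term P = <a,2>.Q + <c,2>.0 with Q = <b,4>.0 (hypothetical example).
rt = {"P": F(4), "Q": F(4)}
c1 = [("P", "a", F(2)), ("Q", "b", F(4))]   # trace a.b
c2 = [("P", "c", F(2))]                      # trace c
print(trace(c1), prob(c1, rt), time(c1, rt))
print(within([c1, c2], [F(1, 4)], rt) == [c2])   # True: c1 is too long for theta
```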

The main idea underlying the testing approach is that two process terms are equivalent whenever an 
external observer interacting with them by means of tests cannot infer any distinguishing information 
from the functional and quantitative standpoints. Tests are represented as process terms that interact with 
the terms to be tested through a parallel composition operator enforcing synchronization on all visible 
action names. A test is passed with success whenever a specific point during execution is reached. In the 
rest of the paper, we model tests as non-recursive, finite-state process terms. 

Intuitively, at each state the process term proposes the execution of a durational action chosen according to the race policy and then, if such an action is visible, the test decides either to react by enabling the interaction or to block it (note that tests cannot block the execution of $\tau$ actions). The interaction can occur only between actions with the same name. If the test offers several actions with the same name as that of the action chosen by the term, then one of them is selected probabilistically, according to their weights.

Formally, tests consist of nondurational actions, each equipped with a weight $w \in \mathbb{R}_{>0}$. The set of tests respecting a canonical form is necessary and sufficient to decide whether two process terms are Markovian testing equivalent. Each of these canonical tests allows for exactly one computation leading to success, whose intermediate states can have alternative computations leading to failure in one step.

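Definition 2.6 The set $\mathcal{T}_{\mathrm{R,c}}$ of canonical reactive tests is generated by the syntax:

$$T ::= s \mid \langle a,*_1\rangle.T + \sum_{b \in \mathcal{E} - \{a\}} \langle b,*_1\rangle.f$$

where $a \in \mathcal{E}$, $\mathcal{E} \subseteq Name - \{\tau\}$ finite, the summation is absent whenever $\mathcal{E} = \{a\}$, and $s$ (resp. $f$) is a zeroary operator standing for success (resp. failure). ■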

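The following semantic rules define the interaction between a process term and a test:

$$\frac{P \xrightarrow{\tau,\lambda} P'}{P \parallel T \xrightarrow{\tau,\lambda} P' \parallel T} \qquad \frac{P \xrightarrow{a,\lambda} P' \qquad T \xrightarrow{a,*_w}_{\mathrm{R}} T'}{P \parallel T \xrightarrow{a,\;\lambda \cdot \frac{w}{weight(T,a)}} P' \parallel T'}$$

where $weight(T,a) = \sum \{\!| w \in \mathbb{R}_{>0} \mid \exists T'.\; T \xrightarrow{a,*_w}_{\mathrm{R}} T' |\!\}$ is the weight of $T$ with respect to $a$ and $\longrightarrow_{\mathrm{R}}$ denotes the transition relation for tests.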
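The rate computed by the synchronization rule can be illustrated with a short sketch. The helper names `weight` and `sync` are hypothetical, a test state is encoded as a list of `(name, weight, target)` moves, and the test used below is a general (non-canonical) one, since canonical tests offer at most one action per name and therefore never split a rate.

```python
from fractions import Fraction as F


def weight(test_moves, a):
    """weight(T, a): total weight of the a-moves offered by test state T."""
    return sum((w for name, w, _dst in test_moves if name == a), F(0))


def sync(lam, a, test_moves):
    """Moves of P || T when P proposes <a, lam>, following the two rules above.

    Returns (rate, test_target) pairs; an empty list means T blocks a.
    tau is never blocked and leaves the test unchanged (target None).
    """
    if a == "tau":
        return [(lam, None)]
    total = weight(test_moves, a)
    return [(lam * w / total, dst) for name, w, dst in test_moves if name == a]


# A general (non-canonical) test state offering a twice, with weights 1 and 3.
moves = [("a", F(1), "T1"), ("a", F(3), "T2"), ("b", F(1), "f")]
print(sync(F(2), "a", moves))   # rate 2 split into 1/2 and 3/2
print(sync(F(2), "c", moves))   # []: c is blocked
```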

Given $P \in \mathcal{P}$ and $T \in \mathcal{T}_{\mathrm{R,c}}$, the interaction system of $P$ and $T$ is the process term $P \parallel T$, where each state of $[\![P \parallel T]\!]$ is called a configuration. We say that a configuration is successful if its test part is $s$, and that a test-driven computation is successful if it traverses a successful configuration. We denote with $\mathcal{SC}(P,T)$ the multiset of successful computations of $P \parallel T$. It is worth noting that for any sequence $\theta \in (\mathbb{R}_{>0})^*$ of average amounts of time, the multiset $\mathcal{SC}^{|\theta|}_{\le\theta}(P,T)$ is finite and all of its computations have finite length and are independent of each other.

Markovian testing equivalence requires comparing the probabilities of performing successful test-driven computations within a given sequence of average amounts of time.

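Definition 2.7 Let $P_1, P_2 \in \mathcal{P}$. We say that $P_1$ is Markovian testing equivalent to $P_2$, written $P_1 \sim_{\mathrm{MT}} P_2$, iff for all reactive tests $T \in \mathcal{T}_{\mathrm{R,c}}$ and sequences $\theta \in (\mathbb{R}_{>0})^*$ of average amounts of time:

$$prob(\mathcal{SC}^{|\theta|}_{\le\theta}(P_1,T)) = prob(\mathcal{SC}^{|\theta|}_{\le\theta}(P_2,T))$$ ■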
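The quantity compared in Def. 2.7 can be computed for a small, explicitly given interaction system by enumerating the computations of length $|\theta|$. This is a brute-force sketch, not the polynomial decision algorithm for $\sim_{\mathrm{MT}}$; the encoding and the function name are hypothetical, and the caller supplies, for each configuration, the total exit rate used both to normalize transition probabilities (as in Def. 2.4) and to obtain the average sojourn time.

```python
from fractions import Fraction as F


def success_prob(lts, init, is_success, theta):
    """Brute-force prob(SC^{|theta|}_{<=theta}(P, T)) on an explicit interaction system.

    lts[conf] = (exit_rate, [(rate, target), ...]); the caller-supplied exit
    rate normalizes probabilities and gives the average sojourn time
    1 / exit_rate. Only computations of length exactly |theta| are counted,
    step i must satisfy 1 / exit_rate <= theta[i], and a computation counts
    if it traverses a successful configuration.
    """
    def explore(conf, i, p, hit):
        hit = hit or is_success(conf)
        if i == len(theta):
            return p if hit else F(0)
        exit_rate, moves = lts.get(conf, (F(0), []))
        if exit_rate == 0 or 1 / exit_rate > theta[i]:
            return F(0)   # no move, or too slow on average for theta[i]
        return sum((explore(dst, i + 1, p * r / exit_rate, hit)
                    for r, dst in moves), F(0))

    return explore(init, 0, F(1), False)


def ok(conf):
    """A configuration is successful when its test part is s."""
    return conf.endswith("||s")


# P = <a,2>.0 + <b,2>.0 tested with the canonical test T = <a,*1>.s + <b,*1>.f
lts = {"P||T": (F(4), [(F(2), "0||s"), (F(2), "0||f")])}
print(success_prob(lts, "P||T", ok, [F(1, 4)]))   # 1/2
print(success_prob(lts, "P||T", ok, [F(1, 5)]))   # 0
```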

The following example justifies why the average duration of a computation has been defined in terms 
of the sequence of average sojourn times in the states traversed by the computation, rather than simply 
considering the sum of average durations. 

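Example 2.8 Consider the two process terms:

$$\langle g,\gamma\rangle.\langle a,\lambda\rangle.\langle b,\mu\rangle.0 + \langle g,\gamma\rangle.\langle a,\mu\rangle.\langle d,\lambda\rangle.0$$
$$\langle g,\gamma\rangle.\langle a,\lambda\rangle.\langle d,\mu\rangle.0 + \langle g,\gamma\rangle.\langle a,\mu\rangle.\langle b,\lambda\rangle.0$$

Under the assumption $\lambda \neq \mu$ and $b \neq d$, both terms have a computation with concrete trace $g \circ a \circ b$, probability $\frac{1}{2}$, and average duration $\frac{1}{2\gamma} + \frac{1}{\lambda} + \frac{1}{\mu}$, but with different stepwise average sojourn times ($\frac{1}{2\gamma}, \frac{1}{\lambda}, \frac{1}{\mu}$ versus $\frac{1}{2\gamma}, \frac{1}{\mu}, \frac{1}{\lambda}$). We can argue similarly for the computation with concrete trace $g \circ a \circ d$. Intuitively, an external observer distinguishes between them by observing the names of the actions that are performed and the instants at which they are performed. This is captured by $\sim_{\mathrm{MT}}$, as the two process terms are not Markovian testing equivalent. ■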
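The difference between total and stepwise durations in Example 2.8 can be checked numerically. The rates $\gamma = 1$, $\lambda = 2$, $\mu = 3$ are arbitrary values satisfying $\lambda \neq \mu$, and `theta` is one hypothetical threshold sequence that separates the two $g \circ a \circ b$ computations.

```python
from fractions import Fraction as F

gamma, lam, mu = F(1), F(2), F(3)      # arbitrary rates with lam != mu

# Stepwise average durations of the g.a.b computations: the initial state
# races two <g,gamma> actions (exit rate 2*gamma), then one action per state.
first = [1 / (2 * gamma), 1 / lam, 1 / mu]    # <g,γ>.<a,λ>.<b,μ>.0 branch
second = [1 / (2 * gamma), 1 / mu, 1 / lam]   # <g,γ>.<a,μ>.<b,λ>.0 branch
prob_gab = gamma / (2 * gamma)                # same probability in both terms


def fits(times, theta):
    """Stepwise check time(c)[i] <= theta[i] for a computation of length <= |theta|."""
    return len(times) <= len(theta) and all(t <= th for t, th in zip(times, theta))


theta = [F(1, 2), F(1, 2), F(1, 3)]
print(prob_gab, sum(first) == sum(second), first == second)   # 1/2 True False
print(fits(first, theta), fits(second, theta))                # True False
```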
3 Approximate Markovian Testing Equivalence

In this section, we show three levels of approximation for $\sim_{\mathrm{MT}}$. The goal is to estimate from different perspectives how similar a process term $P_2$ is to a given process term $P_1$. Here we assume that $P_1$ represents the original model to be approximated through an alternative model $P_2$. Since similarity is not transitive in general, as usual when relaxing equivalence relations we will also investigate what can be "transitively" inferred about the distance between two process terms $P_1$ and $P_3$ whenever there exists a process term $P_2$ that is similar to both of them.

The three considered dimensions of the similarity problem are: time taken to pass a test (Sect. 3.1), expressed as the sequence of average sojourn times in the states traversed by successful computations; probability with which tests are passed (Sect. 3.2); and syntactical form of the passed test (Sect. 3.3). For every dimension, we will provide a measure of the distance between process terms that do not satisfy $\sim_{\mathrm{MT}}$, by stepwise refining the notion of similarity in terms of flexibility and usability. In each case, we will discuss the interpretation of the measure and the complexity of the algorithm measuring the distance between process terms. Finally, we will present a unifying notion of approximate Markovian testing equivalence (Def. 3.18), which joins all the ingredients mentioned before. Indeed, a unifying framework is useful to study the trade-off among the three orthogonal aspects and its impact upon the inequalities of the process terms under comparison.

3.1 Approximating Time 

The first dimension under consideration is time. In the setting of $\sim_{\mathrm{MT}}$, the time needed to pass a test with success is described as the sequence of average sojourn times in the states traversed by successful computations. Approximation at this level consists in relaxing the condition concerning the average sojourn times. We will introduce such an approximation incrementally, in several steps. First, we will show how a process term $P_1$ can be approximated by a process term $P_2$ that is either "slightly slower" or "slightly faster" than $P_1$. Then, we join both interpretations of similarity in order to obtain the most general definition of Markovian testing similarity with respect to time.

We start by introducing the idea of slow approximation. Whenever $P_2$ approximates successful computations of $P_1$ with respect to a test $T$ and temporal threshold $\theta \in (\mathbb{R}_{>0})^*$, stepwise average sojourn times slightly greater than those imposed by $\theta$ may be tolerated. In this case, we obtain a slow approximation, in the sense that $P_2$ simulates $P_1$ (the same tests are passed with the same probabilities) but the successful computations of $P_2$ can be slower than the corresponding ones of $P_1$.

As a first attempt at formalizing this intuition, we define the multiset of computations in $C \subseteq \mathcal{C}_f(P)$ whose stepwise average duration is not greater than $\theta \in (\mathbb{R}_{>0})^*$ plus $\varepsilon \in \mathbb{R}_{\ge 0}$, which acts as a tolerance threshold:

$$C_{\le\theta+\varepsilon} = \{\!| c \in C \mid |c| \le |\theta| \wedge \forall i = 1,\dots,|c|.\; time(c)[i] \le \theta[i] + \varepsilon |\!\}$$

Based on this definition, we have the following relaxation of $\sim_{\mathrm{MT}}$.

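Definition 3.1 Let $P_1, P_2 \in \mathcal{P}$ and $\varepsilon \in \mathbb{R}_{\ge 0}$. We say that $P_2$ is slow Markovian testing $\varepsilon$-similar to $P_1$ iff for all reactive tests $T \in \mathcal{T}_{\mathrm{R,c}}$ and sequences $\theta \in (\mathbb{R}_{>0})^*$ of average amounts of time:

$$prob(\mathcal{SC}^{|\theta|}_{\le\theta}(P_1,T)) = prob(\mathcal{SC}^{|\theta|}_{\le\theta+\varepsilon}(P_2,T))$$ ■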

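Example 3.2 Consider the process terms $P_1 = \langle g,\gamma\rangle.\langle a,\gamma\rangle.0$ and $P_2 = \langle g,\gamma-\delta\rangle.\langle a,\gamma-\delta\rangle.0$, with $0 < \delta < \gamma$. Then, $P_2$ approximates (is slow Markovian testing $\varepsilon$-similar to) $P_1$, where $\varepsilon = \frac{1}{\gamma-\delta} - \frac{1}{\gamma}$ expresses exactly the difference between the stepwise average amounts of time of the computations of $P_1$ and $P_2$. ■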
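The filter $C_{\le\theta+\varepsilon}$ and Example 3.2 can be sanity-checked on a finite grid of thresholds. This is a numerical sketch under hypothetical values $\gamma = 2$ and $\delta = 1$, not a proof: each term has a single computation $g \circ a$ of probability 1, so the probabilities in Def. 3.1 agree exactly when both computations pass their respective filters.

```python
from fractions import Fraction as F
from itertools import product


def within_eps(times, theta, eps):
    """c in C_{<=theta+eps}: |c| <= |theta| and time(c)[i] <= theta[i] + eps."""
    return len(times) <= len(theta) and all(
        t <= th + eps for t, th in zip(times, theta))


gamma, delta = F(2), F(1)                 # hypothetical values, 0 < delta < gamma
p1 = [1 / gamma, 1 / gamma]               # <g,γ>.<a,γ>.0
p2 = [1 / (gamma - delta)] * 2            # <g,γ-δ>.<a,γ-δ>.0
eps = 1 / (gamma - delta) - 1 / gamma     # tolerance of Example 3.2

# Over a grid of thresholds of length 2, P1's computation passes theta exactly
# when P2's computation passes theta + eps.
grid = [F(k, 8) for k in range(1, 17)]
agree = all(within_eps(p1, th, 0) == within_eps(p2, th, eps)
            for th in product(grid, repeat=2))
print(eps, agree)   # 1/2 True
```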
Note that $P_1 \sim_{\mathrm{MT}} P_2$ if and only if $P_2$ (resp. $P_1$) is slow Markovian testing 0-similar to $P_1$ (resp. $P_2$). Moreover, we have the following transitivity result.

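Proposition 3.3 Let $P_1, P_2, P_3 \in \mathcal{P}$ and $\varepsilon_1, \varepsilon_2 \in \mathbb{R}_{\ge 0}$. If $P_2$ is slow Markovian testing $\varepsilon_1$-similar to $P_1$ and $P_3$ is slow Markovian testing $\varepsilon_2$-similar to $P_2$, then $P_3$ is slow Markovian testing $(\varepsilon_1+\varepsilon_2)$-similar to $P_1$. ■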

In favor of this approximation of $\sim_{\mathrm{MT}}$, we observe that it can be decided through a trivial variant of the algorithm for $\sim_{\mathrm{MT}}$ (outlined later in this section), with the same time complexity, which is $O(n^5)$, where $n$ is the total number of states of $[\![P_1]\!]$ and $[\![P_2]\!]$ [2]. However, such an approximation is too restrictive, as illustrated in the following example.

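Example 3.4 Consider the process terms of the previous example. Then, $P_2$ is not slow Markovian testing $\varepsilon'$-similar to $P_1$ for any $\varepsilon' > \frac{1}{\gamma-\delta} - \frac{1}{\gamma}$. In fact, take $\theta[1]$ such that $\theta[1] < \frac{1}{\gamma} \wedge \frac{1}{\gamma-\delta} \le \theta[1] + \varepsilon'$. With this temporal threshold, every computation of $P_1$ is discarded, while this is not the case for $P_2$. ■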
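The counterexample of Example 3.4 can be reproduced with the same kind of sketch, again assuming hypothetical values $\gamma = 2$ and $\delta = 1$ (so that the exact tolerance is $\frac{1}{2}$) and picking $\varepsilon' = \frac{3}{4}$ together with $\theta[1] = \frac{3}{8}$, which satisfies $\theta[1] < \frac{1}{\gamma}$ and $\frac{1}{\gamma-\delta} \le \theta[1] + \varepsilon'$.

```python
from fractions import Fraction as F


def within_eps(times, theta, eps):
    """c in C_{<=theta+eps}: |c| <= |theta| and time(c)[i] <= theta[i] + eps."""
    return len(times) <= len(theta) and all(
        t <= th + eps for t, th in zip(times, theta))


gamma, delta = F(2), F(1)                    # hypothetical values
p1 = [1 / gamma, 1 / gamma]                  # <g,γ>.<a,γ>.0
p2 = [1 / (gamma - delta)] * 2               # <g,γ-δ>.<a,γ-δ>.0
eps_exact = 1 / (gamma - delta) - 1 / gamma  # 1/2
eps_big = eps_exact + F(1, 4)                # some eps' > eps_exact

# theta[1] < 1/gamma and 1/(gamma - delta) <= theta[1] + eps'
theta = [F(3, 8), F(1)]
kept_p1 = within_eps(p1, theta, 0)           # discarded: 1/2 > 3/8
kept_p2 = within_eps(p2, theta, eps_big)     # kept: 1 <= 3/8 + 3/4
print(kept_p1, kept_p2)                      # False True
```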

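In order to further relax $\sim_{\mathrm{MT}}$, we need to compare explicitly the sets of computations of $P_1$ and $P_2$. Formally, given $C, C' \subseteq \mathcal{C}_f(P)$, we now define the multiset of computations in $C$ whose stepwise average duration is either not greater than $\theta \in (\mathbb{R}_{>0})^*$ or else $\varepsilon$-similar, with $\varepsilon \in \mathbb{R}_{\ge 0}$, to the stepwise average duration of some computation in $C'_{\le\theta}$. Therefore:

$$C_{\le\theta+\varepsilon,C'} = C_{\le\theta} \cup \{\!| c \in C \mid c \notin C_{\le\theta} \wedge \exists c' \in C'_{\le\theta}.\; |c| \le |c'| \wedge \forall i = 1,\dots,|c|.\; time(c')[i] \le time(c)[i] \le time(c')[i] + \varepsilon |\!\}$$

Based on this definition, we propose a new approximation of $\sim_{\mathrm{MT}}$.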

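Definition 3.5 Let $P_1, P_2 \in \mathcal{P}$ and $\varepsilon \in \mathbb{R}_{\ge 0}$. We say that $P_2$ is slow Markovian testing $\varepsilon$-similar to $P_1$ iff for all reactive tests $T \in \mathcal{T}_{\mathrm{R,c}}$ and sequences $\theta \in (\mathbb{R}_{>0})^*$ of average amounts of time:

$$prob(\mathcal{SC}^{|\theta|}_{\le\theta}(P_1,T)) = prob(\mathcal{SC}^{|\theta|}_{\le\theta+\varepsilon,\,\mathcal{SC}^{|\theta|}(P_1,T)}(P_2,T))$$ ■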

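Intuitively, $\mathcal{SC}^{|\theta|}_{\le\theta}(P_1,T)$ is compared with $\mathcal{SC}^{|\theta|}_{\le\theta}(P_2,T)$ augmented with the successful $T$-driven computations of $P_2$ that are slower (up to $\varepsilon$) than corresponding computations in $\mathcal{SC}^{|\theta|}_{\le\theta}(P_1,T)$.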

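Example 3.6 Consider two process terms $P_1$ and $P_2$ that are defined as follows, respectively (with $0 < \delta < \lambda$):

$$\langle g,\gamma\rangle.\langle a,\lambda\rangle.\langle b,\lambda\rangle.0 + \langle g,\gamma\rangle.\langle a,\lambda\rangle.\langle d,\lambda\rangle.0$$
$$\langle g,\gamma\rangle.\langle a,\lambda\rangle.\langle d,\lambda-\delta\rangle.0 + \langle g,\gamma\rangle.\langle a,\lambda-\delta\rangle.\langle b,\lambda\rangle.0$$

The computation $c_1$ with concrete trace $g \circ a \circ b$ of $P_1$ is slowly $\varepsilon$-simulated by the corresponding computation $c_2$ of $P_2$, provided that $\varepsilon \ge \frac{1}{\lambda-\delta} - \frac{1}{\lambda}$. Given any test $T \in \mathcal{T}_{\mathrm{R,c}}$, for each $\theta \in (\mathbb{R}_{>0})^*$ we have that $c_1 \in \mathcal{SC}^{|\theta|}_{\le\theta}(P_1,T)$ iff $c_2 \in \mathcal{SC}^{|\theta|}_{\le\theta+\varepsilon,\,\mathcal{SC}^{|\theta|}(P_1,T)}(P_2,T)$, because from the temporal standpoint $c_2$ is stepwise slower than $c_1$ and their difference is bounded by $\varepsilon$. We can argue similarly in the case of the two computations with concrete trace $g \circ a \circ d$. Hence, $P_2$ is slow Markovian testing $\varepsilon$-similar to $P_1$. ■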
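The augmented multiset $C_{\le\theta+\varepsilon,C'}$ and Example 3.6 can be checked in the same spirit. This sketch works directly on stepwise average durations, assumes hypothetical rates $\gamma = 1$, $\lambda = 2$, $\delta = 1$, and considers a canonical test whose only successful trace is $g \circ a \circ b$, so that each term has exactly one successful computation, both with probability $\frac{1}{2}$; comparing probabilities then reduces to comparing membership. The check is over a finite grid only.

```python
from fractions import Fraction as F
from itertools import product


def fits(times, theta):
    """c in C_{<=theta}: |c| <= |theta| and time(c)[i] <= theta[i] for all i."""
    return len(times) <= len(theta) and all(t <= th for t, th in zip(times, theta))


def slowly_close(c, c_ref, eps):
    """|c| <= |c'| and time(c')[i] <= time(c)[i] <= time(c')[i] + eps for all i."""
    return len(c) <= len(c_ref) and all(r <= t <= r + eps for t, r in zip(c, c_ref))


def extend_slow(C, C_ref, theta, eps):
    """C_{<=theta+eps,C'}: C_{<=theta} plus the computations of C that are slower,
    by at most eps per step, than some computation of C'_{<=theta}."""
    ref = [r for r in C_ref if fits(r, theta)]
    base = [c for c in C if fits(c, theta)]
    extra = [c for c in C if not fits(c, theta)
             and any(slowly_close(c, r, eps) for r in ref)]
    return base + extra


gamma, lam, delta = F(1), F(2), F(1)                # hypothetical rates
c1 = [1 / (2 * gamma), 1 / lam, 1 / lam]            # g.a.b in P1
c2 = [1 / (2 * gamma), 1 / (lam - delta), 1 / lam]  # g.a.b in P2
eps = 1 / (lam - delta) - 1 / lam


def agrees(e):
    """Check the equation of Def. 3.5 for this test on a grid of thresholds."""
    grid = [F(k, 4) for k in range(1, 6)]
    return all(fits(c1, th) == (len(extend_slow([c2], [c1], th, e)) == 1)
               for th in product(grid, repeat=3))


print(eps, agrees(eps), agrees(eps / 2))   # 1/2 True False
```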

Note that $P_1 \sim_{\mathrm{MT}} P_2$ if and only if $P_2$ (resp. $P_1$) is slow Markovian testing 0-similar to $P_1$ (resp. $P_2$). Moreover, we have the following transitivity result.



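Proposition 3.7 Let $P_1, P_2, P_3 \in \mathcal{P}$ and $\varepsilon_1, \varepsilon_2 \in \mathbb{R}_{\ge 0}$. If $P_2$ is slow Markovian testing $\varepsilon_1$-similar to $P_1$ and $P_3$ is slow Markovian testing $\varepsilon_2$-similar to $P_2$, then $P_3$ is slow Markovian testing $\delta$-similar to $P_1$ for some $\delta \le \varepsilon_1+\varepsilon_2$. ■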
Alternatively, by a symmetric argument, we obtain a fast approximation whenever the successful computations of $P_1$ are approximated by successful computations of $P_2$ whose stepwise average duration can be slightly lower than that of the corresponding successful computations of $P_1$. Based on this intuition, we have the following approximation of $\sim_{\mathrm{MT}}$, which preserves the same results as Def. 3.5.

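Definition 3.8 Let $P_1, P_2 \in \mathcal{P}$ and $\varepsilon \in \mathbb{R}_{\ge 0}$. We say that $P_2$ is fast Markovian testing $\varepsilon$-similar to $P_1$ iff for all reactive tests $T \in \mathcal{T}_{\mathrm{R,c}}$ and sequences $\theta \in (\mathbb{R}_{>0})^*$ of average amounts of time: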

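Example 3.9 Consider a variant of the previous example where the second process term is:

$$\langle g,\gamma\rangle.\langle a,\lambda\rangle.\langle d,\lambda+\delta\rangle.0 + \langle g,\gamma\rangle.\langle a,\lambda+\delta\rangle.\langle b,\lambda\rangle.0$$

In this case, it is easy to see that $P_2$ is fast Markovian testing $\varepsilon$-similar to $P_1$, where $\varepsilon \ge \frac{1}{\lambda} - \frac{1}{\lambda+\delta}$. ■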

(Slow and fast) Markovian testing similarity can be decided in polynomial time by exploiting a simple variant of the algorithm for $\sim_{\mathrm{MT}}$, because the main objective, i.e. equating the execution probabilities of certain successful computations, does not change. The only relaxation concerns the average durations of these computations, i.e. the criterion according to which the successful computations to be compared are chosen. We now outline the most important steps of this proof by illustrating the differences with respect to the original algorithm for $\sim_{\mathrm{MT}}$ of [2]. First, deciding $\sim_{\mathrm{MT}}$ is reduced to deciding the Markovian version of ready equivalence, which in turn can be reduced to deciding probabilistic ready equivalence if we consider the embedded discrete-time versions of the CTMCs underlying the two process terms to compare. Then, probabilistic ready equivalence is decided through a suitable reworking of the algorithm for probabilistic language equivalence [19]. In the transformation from continuous time to discrete time, information about the total exit rate of each state is encoded within the action names labeling the transitions leaving that state. The use of this additional information is the only difference between $\sim_{\mathrm{MT}}$ and (slow and fast) Markovian testing similarity. More precisely, when applying the algorithm for probabilistic language equivalence in the case of $\sim_{\mathrm{MT}}$, a state of $[\![P_1]\!]$ is equated to a state of $[\![P_2]\!]$, i.e. they are put into the same accepting set, if and only if the two sets of augmented action names labeling the transitions departing from the two states coincide. In particular, the two states must exhibit the same total exit rate. Hence, the temporal information is a decoration that is used to decide which states of $[\![P_1]\!]$ and $[\![P_2]\!]$ belong to the same accepting set. In our relaxed setting, instead of checking that the total exit rates are equal, as required by $\sim_{\mathrm{MT}}$, we check their inequality up to $\varepsilon$, i.e. a state of $[\![P_1]\!]$ is equated to a state of $[\![P_2]\!]$ if the total exit rate of the second state is greater or lower (depending on the considered version) than the total exit rate of the first state and their difference is bounded by the threshold $\varepsilon$. Then, once the accepting sets are defined according to this condition, the algorithm of [19] proceeds as usual. The time complexity of the overall algorithm is $O(n^5)$.

Markovian testing similarity can be further relaxed. On the one hand, the fast and slow versions can be combined, thus obtaining the following definition.

Definition 3.10 Let $P_1, P_2 \in \mathcal{P}$ and $\varepsilon \in \mathbb{R}_{\ge 0}$. We say that $P_2$ is temporally Markovian testing $\varepsilon$-similar to $P_1$ iff for all reactive tests $T \in \mathcal{T}_{\mathrm{R,c}}$ and sequences $\theta \in (\mathbb{R}_{>0})^*$ of average amounts of time:

Hence, a computation $c$ of $P_1$ can be approximated either by a slower or by a faster computation of $P_2$. However, $c$ cannot be approximated by a computation of $P_2$ that is slower than $c$ at some steps and faster at others. In order to overcome this limitation, we introduce the following relaxation of $C_{\le\theta+\varepsilon,C'}$:

$$C_{\le\theta\pm\varepsilon,C'} = C_{\le\theta} \cup \{\!| c \in C \mid c \notin C_{\le\theta} \wedge \exists c' \in C'_{\le\theta}.\; |c| \le |c'| \wedge \forall i = 1,\dots,|c|.\; time(c')[i] - \varepsilon \le time(c)[i] \le time(c')[i] + \varepsilon |\!\}$$



A. Aldini 






Based on this notion of approximation, a computation $c$ is similar to a computation $c'$ if, at each step, the difference between their average sojourn times is at most $\varepsilon$. Then, we have the following variant of Def. 3.10.

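Definition 3.11 Let $P_1, P_2 \in \mathcal{P}$ and $\varepsilon \in \mathbb{R}_{\ge 0}$. We say that $P_2$ is temporally Markovian testing $\varepsilon$-similar to $P_1$ iff for all reactive tests $T \in \mathcal{T}_{\mathrm{R,c}}$ and sequences $\theta \in (\mathbb{R}_{>0})^*$ of average amounts of time:

$$prob(\mathcal{SC}^{|\theta|}_{\le\theta}(P_1,T)) = prob(\mathcal{SC}^{|\theta|}_{\le\theta\pm\varepsilon,\,\mathcal{SC}^{|\theta|}(P_1,T)}(P_2,T))$$ ■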

Note that this extension does not alter the decidability results of Markovian testing similarity. 

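Example 3.12 Consider a variant of the previous example where the second process term is:

$$\langle g,\gamma\rangle.\langle a,\lambda-\delta\rangle.\langle d,\lambda+\delta\rangle.0 + \langle g,\gamma\rangle.\langle a,\lambda+\delta\rangle.\langle b,\lambda-\delta\rangle.0$$

It can be verified that $P_2$ is temporally Markovian testing $\varepsilon$-similar to $P_1$, where $\varepsilon \ge \frac{1}{\lambda-\delta} - \frac{1}{\lambda}$. ■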
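For Example 3.12, the stepwise deviations of $P_2$ from $P_1$ can be computed directly, which shows why the two-sided tolerance is needed. The sketch assumes, as in the preceding examples, that $P_1$ is the first term of Example 3.6, and uses hypothetical rates $\gamma = 1$, $\lambda = 2$, $\delta = 1$; it only computes the smallest per-step tolerance and does not check Def. 3.11 itself.

```python
from fractions import Fraction as F

gamma, lam, delta = F(1), F(2), F(1)   # hypothetical rates, 0 < delta < lam
g = 1 / (2 * gamma)                     # average sojourn time in the initial state

# Stepwise average durations, keyed by concrete trace.
P1 = {"gad": [g, 1 / lam, 1 / lam], "gab": [g, 1 / lam, 1 / lam]}
P2 = {"gad": [g, 1 / (lam - delta), 1 / (lam + delta)],    # <g,γ>.<a,λ-δ>.<d,λ+δ>.0
      "gab": [g, 1 / (lam + delta), 1 / (lam - delta)]}    # <g,γ>.<a,λ+δ>.<b,λ-δ>.0


def deviations(c, c_ref):
    """Per-step differences time(c)[i] - time(c')[i]."""
    return [t - r for t, r in zip(c, c_ref)]


devs = {k: deviations(P2[k], P1[k]) for k in P1}
eps_min = max(abs(d) for ds in devs.values() for d in ds)
print(devs["gad"])                              # slower at step 2, faster at step 3
print(eps_min == 1 / (lam - delta) - 1 / lam)   # True
```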

On the other hand, when comparing the computations of two process terms, we can decide to change at each step the value of the threshold expressing the tolerance to different temporal behaviors. This is obtained by assuming $\varepsilon \in (\mathbb{R}_{\ge 0})^*$ and checking, e.g., the inequality:

$$\forall i = 1,\dots,|c|.\; time(c')[i] \le time(c)[i] \le time(c')[i] + \varepsilon[i]$$

within the definition of $C_{\le\theta+\varepsilon,C'}$. For instance, this variant can be used to discount the effect of steps that are far in the future by assuming that $\varepsilon[i]$ increases as $i$ increases.
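A per-step tolerance can be sketched by replacing the scalar $\varepsilon$ with a sequence. The function name and the doubling schedule for $\varepsilon[i]$ are hypothetical choices that merely illustrate tolerances growing with the step index.

```python
from fractions import Fraction as F


def slowly_close_stepwise(c, c_ref, eps):
    """time(c')[i] <= time(c)[i] <= time(c')[i] + eps[i] for i = 1..|c|.

    Assumes one tolerance per step, i.e. len(eps) >= len(c).
    """
    return len(c) <= len(c_ref) and all(
        r <= t <= r + e for t, r, e in zip(c, c_ref, eps))


# Hypothetical tolerances doubling at each step: 1/10, 1/5, 2/5, 4/5.
eps = [F(1, 10) * 2 ** i for i in range(4)]
ref = [F(1)] * 4                              # stepwise durations of c'
late_delay = [F(1), F(1), F(1), F(3, 2)]      # 1/2 slower only at step 4
early_delay = [F(3, 2), F(1), F(1), F(1)]     # 1/2 slower only at step 1
print(slowly_close_stepwise(late_delay, ref, eps),
      slowly_close_stepwise(early_delay, ref, eps))   # True False
```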

3.2 Approximating Probability 

Introducing a relaxation of the probabilistic behavior of process terms results in the following extension of $\sim_{\mathrm{MT}}$, where the probabilities of the successful $T$-driven computations of $P_1$ and $P_2$ are no longer required to be equal.

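Definition 3.13 Let $P_1, P_2 \in \mathcal{P}$ and $\varepsilon \in \mathbb{R}_{\ge 0}$. We say that $P_2$ is probabilistically Markovian testing $\varepsilon$-similar to $P_1$ iff for all reactive tests $T \in \mathcal{T}_{\mathrm{R,c}}$ and sequences $\theta \in (\mathbb{R}_{>0})^*$ of average amounts of time:

$$|prob(\mathcal{SC}^{|\theta|}_{\le\theta}(P_1,T)) - prob(\mathcal{SC}^{|\theta|}_{\le\theta}(P_2,T))| \le \varepsilon$$ ■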

As we have seen in Sect. 3.1, verifying Markovian testing equivalence amounts to deciding whether two probabilistic automata accept the same words with the same probability. However, as shown in [10], the relaxation of this equivalence problem, i.e. checking whether for all words the distance between the two process models is less than $\varepsilon$, is undecidable.

To recover decidability, it is possible to restrict ourselves to more specific notions of probabilistic similarity. As an example, [17] defines a polynomially accurate similarity that can be rephrased in our testing framework as follows: any set of successful computations of $P_1$ with a polynomial number of steps must be matched by $P_2$ with an error that is bounded by any polynomial. In order to measure the distance between process terms even when their difference is not negligible in the sense of [17], in the next section we will show that decidability is obtained by relaxing the condition over tests in Def. 3.13.

3.3 Approximating Tests 

Similarly to Sect. 3.1, in this section we incrementally develop a notion of similarity that is based on the exemplary behavior of tests. The proposed approach is not completely naive, as it is inspired by [5], where processes are compared with respect to an event log describing typical behaviors. In particular, in [5] processes are defined in terms of Petri nets and an event log is a multiset of firing sequences. Then, different models are compared by measuring the overlap in (partially) fitting these sequences. This is done by using a fitness function that takes into account all enabled transitions at any point in each sequence. This idea results in two measures, called precision and recall. Precision establishes whether the behavior of the second, alternative model is possible from the viewpoint of the behavior of the first, original model. Recall establishes how much of the behavior of the first model is covered by the second model. In our setting, we resort to a variant of this kind of approach from two different perspectives.

First, we observe that the notion of typical behavior at the base of model evaluation is naturally represented by tests. While in [5] it is suggested to define the event log through simulation or by explicitly describing by hand some typical behavior of interest, in our setting we formally describe an event log as a finite set of tests satisfying properties expressed as logical formulas. Canonical tests do not exhibit any probabilistic and temporal behavior, so that we can employ the logical characterization of testing equivalence, which comprises a restricted set of logical operators: a modal operator on sequences of visible actions, true, disjunction, and diamond [2]. Then, given a formula $\phi$ representing a property of interest, we use as event log the set of canonical tests satisfying $\phi$, called $\mathcal{T}_{\mathrm{R,c},\phi}$, provided that such a set is finite. As an example, $\phi$ could be the formula that is satisfied by all the tests in which the unique computation leading to success has concrete trace $a_1 \circ \dots \circ a_n$. Thus, this trace represents the property with respect to which it is interesting to compare two process terms. In general, tests satisfying $\phi$ denote the set of typical behaviors, parameterized by $\phi$, which guide the estimation of the degree of similarity between process terms.

Second, we observe that a test-based notion of the fitness measures of [5] can be used to estimate the
similarity between tests. Approximating tests, as well as relaxing time and probability requirements, is
justified by the fact that we intend to overcome the typical limitations of "perfect" equivalence. In order
to relax $\sim_{\mathrm{MT}}$ by following this intuition, we assume that the process terms to compare are not expected
to exhibit the same quantitative behavior when interacting with the same test, but they can exhibit such
a behavior when interacting with two possibly different but similar tests. In other words, if a process
term satisfies a test with a certain probability and within a given amount of time, then the second one
can simulate the behavior of the first term by satisfying, with the same probability and within the same time,
another test that fits the first test according to a notion of test similarity.

Inspired by the formulas of [5], we now define the notions of behavioral precision and recall for
test similarity. Let $trace_s(T)$ be the concrete trace associated with the unique computation of $T$ leading
to success, $|T|$ be the length of this trace, and $T_i$ be the $i$-th state of this computation, such that $T_1 ::= T$ and $T_{|T|}$
is the state that reaches success in one step. Then, for all $i = 1, \ldots, |T|$, we assume that $enabled(T,i,s) = \{a\}$
iff $trace_s(T)[i] = a$, and $enabled(T,i,f) = \{b \mid T_i ::= \langle b, *_1\rangle.f + \tilde{T}\}$. In practice, $enabled(T,i,s)$ denotes
the transition belonging to the successful computation of $T$ that is enabled at the $i$-th step, while
$enabled(T,i,f)$ denotes the set of transitions leading to failure in one step that are enabled at the $i$-th
step. Then, we introduce the following definitions of precision and recall for two tests $T$ and $T'$:

$$prec(T,T') = \frac{1}{|T'|} \sum_{i=1}^{|T'|} \frac{|(enabled(T,i,s) \cap enabled(T',i,s)) \cup (enabled(T,i,f) \cap enabled(T',i,f))|}{|enabled(T',i,f)| + |enabled(T',i,s)|}$$

and:

$$rec(T,T') = \frac{1}{|T|} \sum_{i=1}^{|T|} \frac{|(enabled(T,i,s) \cap enabled(T',i,s)) \cup (enabled(T,i,f) \cap enabled(T',i,f))|}{|enabled(T,i,f)| + |enabled(T,i,s)|}$$


At each step, we compare the sets of enabled transitions for the current states of the two tests, by
distinguishing the transitions leading to failure from the unique one along the computation leading to
success. Both formulas yield a measure between 0 and 1 that estimates the similarity between the two tests.
Obviously, it holds that $prec(T,T') = rec(T',T)$. Similarly as in [5], it is important to note that tests
are not required to offer the same behavior, which may differ step by step, thus originating different
computations.

Table 1: Transitivity relations for prec and rec: $z,w,x,y \in [0,1[$

Analogously, $T$ and $T'$ are not required to have the same length. For instance, if $|T| = 2 \cdot |T'| = 2 \cdot n$
and the behaviors of $T$ and $T'$ coincide in the first $n$ steps, then $prec(T,T') = 1$ because each behavior of
$T'$ is possible according to the behavior of $T$, while $rec(T,T') = \frac{1}{2}$ because only half of the behavior of $T$
is covered by the behavior of $T'$ (the last $n$ steps of $T$ contribute nothing, since $T'$ has no corresponding states).
On the other hand, $T$ and $T'$ coincide iff $prec(T,T') = rec(T,T') = 1$.

Example 3.14 Consider $T_1 = \langle a, *_1\rangle.s + \langle b, *_1\rangle.f$ and $T_2 = \langle b, *_1\rangle.s + \langle a, *_1\rangle.f$. Then, it holds
that $prec(T_1,T_2) = rec(T_1,T_2) = 0$ because we distinguish actions leading to success from those leading
to failure. Without this distinction, we would obtain $prec(T_1,T_2) = rec(T_1,T_2) = 1$.

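Now, consider the two tests $T_1 = \langle a_1, *_1\rangle.\langle a_2, *_1\rangle.s + \langle b, *_1\rangle.f$ and $T_2 = \langle c, *_1\rangle.\langle a_2, *_1\rangle.s +
\langle b, *_1\rangle.f + \langle b', *_1\rangle.f$. Then, $prec(T_1,T_2) = \frac{1}{2}(\frac{1}{3} + 1) = \frac{2}{3}$ and $rec(T_1,T_2) = \frac{1}{2}(\frac{1}{2} + 1) = \frac{3}{4}$. Recall is higher than precision,
because the unique behavior of $T_1$ that is not covered by $T_2$ is the first action of the successful computation,
while from the viewpoint of $T_1$ we have two impossible behaviors of $T_2$, i.e., the actions $c$ and $b'$. ■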
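The definitions of precision and recall, together with the values claimed for Example 3.14 and for tests of different lengths, can be checked with a small Python sketch. The encoding of a canonical test as a list of (success action, failure actions) pairs and the helper names `_coverage`, `prec`, `rec`, and `fs` are assumptions made for illustration; steps beyond the length of a test are treated as enabling nothing.

```python
from fractions import Fraction
from typing import FrozenSet, List, Tuple

# Hypothetical encoding of a canonical test as its states T_1 .. T_|T|: each entry is
# (action enabled on the successful computation, actions leading to failure in one step).
Test = List[Tuple[str, FrozenSet[str]]]


def _coverage(base: Test, other: Test) -> Fraction:
    """Average, over the steps of `base`, of the fraction of its enabled transitions
    (success and failure kept apart) that `other` also enables at the same step."""
    total = Fraction(0)
    for i, (s_base, f_base) in enumerate(base):
        if i < len(other):
            s_other, f_other = other[i]
            common = len(({s_base} & {s_other}) | (f_base & f_other))
        else:
            common = 0  # `other` has no i-th state, so nothing is shared
        total += Fraction(common, len(f_base) + 1)  # +1 for the single success action
    return total / len(base)


def prec(t: Test, t2: Test) -> Fraction:
    return _coverage(t2, t)  # steps and denominators taken from T'


def rec(t: Test, t2: Test) -> Fraction:
    return _coverage(t, t2)  # steps and denominators taken from T


def fs(*actions: str) -> FrozenSet[str]:
    return frozenset(actions)


# Example 3.14, first pair: success and failure actions are swapped.
t1, t2 = [("a", fs("b"))], [("b", fs("a"))]
print(prec(t1, t2), rec(t1, t2))  # 0 0

# Example 3.14, second pair.
u1 = [("a1", fs("b")), ("a2", fs())]
u2 = [("c", fs("b", "b'")), ("a2", fs())]
print(prec(u1, u2), rec(u1, u2))  # 2/3 3/4

# |T| = 2n and |T'| = n with the first n steps shared (n = 2 here).
long_t = [("a", fs()), ("b", fs()), ("c", fs()), ("d", fs())]
short_t = long_t[:2]
print(prec(long_t, short_t), rec(long_t, short_t))  # 1 1/2
```

Exact fractions are used so that the computed values can be compared directly with the ones stated in the text, and the identity $prec(T,T') = rec(T',T)$ holds by construction because both functions share `_coverage`.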

Precision and recall satisfy the same transitivity relations shown in [5], which are reported in Table 1 for the
sake of completeness.

Then, by using a notion of test similarity quantified with respect to the precision and recall defined
above, we obtain the following relaxation of $\sim_{\mathrm{MT}}$, which is based on the observed behavior expressed in
terms of test-driven computations: instead of a single test, we consider a pair of tests that approximately fit
each other. This first attempt abstracts from the temporal behavior of the process terms to compare.

Definition 3.15 Let $P_1, P_2 \in \mathcal{P}$ and let $\mathcal{T}_{\mathrm{R,c},\phi}$ be a finite set of tests. We say that $P_2$ is behaviorally Markovian
testing similar to $P_1$ with precision $p \in [0,1]$ and recall $r \in [0,1]$ iff for each reactive test $T \in \mathcal{T}_{\mathrm{R,c},\phi}$ there
exists a reactive test $T' \in \mathcal{T}_{\mathrm{R,c},\phi}$ such that:

1. $prec(T,T') \ge p$ and $rec(T,T') \ge r$

2. $prob(\mathcal{SC}(P_1,T)) = prob(\mathcal{SC}(P_2,T'))$. ■
As far as the transitivity properties of Def. 3.15 are concerned, we now discuss what can be inferred
about two process terms $P_1$ and $P_3$ provided that there exists a process term $P_2$ such that $P_2$ is behaviorally
Markovian testing similar to $P_1$ with precision $p$ and recall $r$, and $P_3$ is behaviorally Markovian testing
similar to $P_2$ with precision $p'$ and recall $r'$. By hypothesis, for each test $T$ applied to $P_1$ there exists a
test $T'$ applied to $P_2$ such that the probabilities of the successful $T$-driven computations of $P_1$ and of the
successful $T'$-driven computations of $P_2$ are equal. By hypothesis, there also exists a test $T''$ applied to $P_3$
such that the probabilities of the successful $T'$-driven computations of $P_2$ and of the successful $T''$-driven
computations of $P_3$ are equal. Hence, the probabilities of the successful $T$-driven computations of $P_1$ and
of the successful $T''$-driven computations of $P_3$ are equal. Then, $prec(T,T'')$ and $rec(T,T'')$ can be
inferred from $p$, $r$, $p'$, and $r'$, as shown in Table 1.

In order to weigh differently the behaviors with a very low probability of success and the successful
behaviors that occur more frequently, in the two inequalities of Def. 3.15 we can multiply
$p$ and $r$ by the probability of the successful test-driven computations of $P_1$.
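Written out, this weighted variant of condition 1 of Def. 3.15 takes the following form (a sketch that reuses the notation of condition 2):

$$prec(T,T') \ge p \cdot prob(\mathcal{SC}(P_1,T)) \qquad \text{and} \qquad rec(T,T') \ge r \cdot prob(\mathcal{SC}(P_1,T))$$

For instance, if the successful $T$-driven computations of $P_1$ have overall probability $0.1$, then a test $T'$ with precision $0.1 \cdot p$ and recall $0.1 \cdot r$ already suffices, so rarely successful behaviors weigh less in the comparison.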

The next step refines the condition about probabilities of Def. 3.15 by taking into account the temporal
behavior of process terms. We recall that $\sim_{\mathrm{MT}}$ is defined with respect to all the sequences $\theta \in (\mathbb{R}_{>0})^*$
of average amounts of time. When considering a canonical test $T$ and a process term $P$ that does not
execute invisible actions, we can restrict ourselves to the sequences of length $|T|$, which is the exact
number of steps needed to reach success. This is not enough to reduce the comparison between $P$ and a
similar test $T'$ to a finite set of sequences. Therefore, we now define a canonical set of sequences for $P$ and $T$
that is finite and is sufficient to decide whether a process term behaviorally simulates another one with
respect to $T$.

Such a canonical set is made of a sequence for each subset of the set of successful computations
$\mathcal{SC}(P,T)$. For each $X \in 2^{\mathcal{SC}(P,T)}$ we define the sequence of average amounts of time $\theta_X$ such that
$\theta_X[i] = \max_{c \in X} time(c)[i]$ for all $i = 1, \ldots, |T|$, and the canonical set $\Theta(P,T) = \{\theta_X \mid X \in 2^{\mathcal{SC}(P,T)}\}$. Note
that $X \subseteq \mathcal{SC}^{\le\theta_X}(P,T)$ and that we may have $\mathcal{SC}^{\le\theta_X}(P,T) = \mathcal{SC}^{\le\theta_Y}(P,T)$ for some $X \neq Y$, so that
the minimum number of sequences to consider could be lower than $|2^{\mathcal{SC}(P,T)}|$.
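As a sketch of the canonical set, the Python code below enumerates $\theta_X$ for every non-empty subset $X$ of successful computations (the empty set is skipped, since the maximum over it is undefined) and records which computations are compatible with each resulting sequence. It assumes that each successful computation is given only by its vector of per-step average times, all of length $|T|$, and that a computation is compatible with $\theta$ when $time(c)[i] \le \theta[i]$ for every step $i$; the names `theta_of`, `compatible`, and `canonical_set` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence, Tuple

# Hypothetical representation: comps[c][i] is the average time of step i + 1 of
# successful computation c; all computations have length |T|.
Times = Tuple[float, ...]


def theta_of(comps: Sequence[Times], subset: Sequence[int]) -> Times:
    """theta_X[i] = max over c in X of time(c)[i]; X must be non-empty."""
    return tuple(max(comps[c][i] for c in subset) for i in range(len(comps[0])))


def compatible(comps: Sequence[Times], theta: Times) -> FrozenSet[int]:
    """Indices of the computations whose every step time is at most theta."""
    return frozenset(c for c, times in enumerate(comps)
                     if all(t <= bound for t, bound in zip(times, theta)))


def canonical_set(comps: Sequence[Times]) -> Dict[Times, FrozenSet[int]]:
    """Map each distinct theta_X (X non-empty) to the computations compatible with it."""
    result: Dict[Times, FrozenSet[int]] = {}
    for size in range(1, len(comps) + 1):
        for subset in combinations(range(len(comps)), size):
            theta = theta_of(comps, subset)
            result[theta] = compatible(comps, theta)
    return result


comps = [(1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
for theta, members in sorted(canonical_set(comps).items()):
    print(theta, sorted(members))
```

Here the seven non-empty subsets yield only three distinct sequences, which illustrates why the number of sequences to consider can be lower than $|2^{\mathcal{SC}(P,T)}|$. The enumeration is exponential in the number of successful computations, so it only illustrates the definition.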

The algorithm that computes these sequences consists of building a tree as follows. The root is at
level 1 and is marked with the set of all the successful computations $\mathcal{SC}(P,T)$. If the current node of
level $i$ is marked with a set $Y$ of computations, then create a child node for each $F \subseteq Y$ for which
there exists $k \in \mathbb{R}_{>0}$ such that $time(c)[i] \le k$ for each $c \in F$. Add to this new node the labels $F$ and
$\max_{c \in F}\{time(c)[i]\}$. The tree construction terminates at level $|T| + 1$. In this way, the tree contains at
most $|2^{\mathcal{SC}(P,T)}|$ leaves, each leaf is associated with a subset $X \in 2^{\mathcal{SC}(P,T)}$, and the path from the root
to this leaf contains as labels the average amounts of time forming the sequence $\theta_X$.

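Proposition 3.16 Let $P_1, P_2 \in \mathcal{P}$ and $T \in \mathcal{T}_{\mathrm{R,c}}$. If for each sequence $\theta \in \Theta(P_1,T) \cup \Theta(P_2,T)$ of average
amounts of time we have:

$$prob(\mathcal{SC}^{\le\theta}(P_1,T)) = prob(\mathcal{SC}^{\le\theta}(P_2,T))$$

then we also have that for each sequence $\theta \in (\mathbb{R}_{>0})^*$ of average amounts of time:

$$prob(\mathcal{SC}^{\le\theta}(P_1,T)) = prob(\mathcal{SC}^{\le\theta}(P_2,T)).$$ ■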
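One observation behind Prop. 3.16 can be spelled out as follows. This is a sketch rather than the proof of the proposition, and it assumes that a computation $c$ belongs to $\mathcal{SC}^{\le\theta}(P,T)$ iff $time(c)[i] \le \theta[i]$ for all $i = 1, \ldots, |T|$. For any $\theta$ of length $|T|$ such that $X = \mathcal{SC}^{\le\theta}(P,T)$ is non-empty:

$$\theta_X[i] = \max_{c \in X} time(c)[i] \le \theta[i] \;\; (i = 1, \ldots, |T|) \quad \Longrightarrow \quad X \subseteq \mathcal{SC}^{\le\theta_X}(P,T) \subseteq \mathcal{SC}^{\le\theta}(P,T) = X$$

Hence $prob(\mathcal{SC}^{\le\theta}(P,T)) = prob(\mathcal{SC}^{\le\theta_X}(P,T))$ with $\theta_X \in \Theta(P,T)$: for the process term $P$ itself, every probability observed through an arbitrary sequence of length $|T|$ is already observed through a sequence of the canonical set.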
Now, we are ready to define a decidable approximation of $\sim_{\mathrm{MT}}$ based on observed behavior.

Definition 3.17 Let $P_1, P_2 \in \mathcal{P}$ and let $\mathcal{T}_{\mathrm{R,c},\phi}$ be a finite set of tests. We say that $P_2$ is behaviorally Markovian
testing similar to $P_1$ with precision $p \in [0,1]$ and recall $r \in [0,1]$ iff for each reactive test $T \in \mathcal{T}_{\mathrm{R,c},\phi}$ there
exists a reactive test $T' \in \mathcal{T}_{\mathrm{R,c},\phi}$ such that for all sequences $\theta \in \Theta(P_1,T) \cup \Theta(P_2,T')$ of average amounts
of time:

1. $prec(T,T') \ge p$ and $rec(T,T') \ge r$

2. $prob(\mathcal{SC}^{\le\theta}(P_1,T)) = prob(\mathcal{SC}^{\le\theta}(P_2,T'))$. ■
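Because both the event log and the canonical sets are finite, Def. 3.17 can be checked by exhaustive search. The Python sketch below assumes a hypothetical data model in which, for every test, the successful test-driven computations of a process term are listed as (probability, per-step average times) pairs; it abstracts condition 1 into a caller-supplied predicate `fits`, and compares probabilities up to a small floating-point tolerance `tol`. The names `prob_within`, `canonical_set`, and `behaviorally_similar` are hypothetical, and the code illustrates the definition rather than an efficient decision procedure.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Times = Tuple[float, ...]
# Hypothetical model: a successful T-driven computation of a process term is a
# pair (probability, per-step average times); all have length |T| for a given T.
Computation = Tuple[float, Times]


def prob_within(comps: Sequence[Computation], theta: Times) -> float:
    """Total probability of the computations compatible with theta, i.e. no
    longer than theta and with each step time at most the matching bound."""
    return sum(p for p, times in comps
               if len(times) <= len(theta)
               and all(t <= bound for t, bound in zip(times, theta)))


def canonical_set(comps: Sequence[Computation]) -> Set[Times]:
    """theta_X for every non-empty subset X of the given computations."""
    thetas: Set[Times] = set()
    for size in range(1, len(comps) + 1):
        for subset in combinations(comps, size):
            length = len(subset[0][1])
            thetas.add(tuple(max(times[i] for _, times in subset) for i in range(length)))
    return thetas


def behaviorally_similar(tests: Sequence[Hashable],
                         comps1: Dict[Hashable, List[Computation]],
                         comps2: Dict[Hashable, List[Computation]],
                         fits: Callable[[Hashable, Hashable], bool],
                         tol: float = 1e-9) -> bool:
    """Check both conditions of Def. 3.17 over a finite event log.

    fits(T, T2) stands for "prec(T, T2) >= p and rec(T, T2) >= r"."""
    for t in tests:
        witness_found = False
        for t2 in tests:
            if not fits(t, t2):
                continue
            thetas = canonical_set(comps1[t]) | canonical_set(comps2[t2])
            if all(abs(prob_within(comps1[t], th) - prob_within(comps2[t2], th)) <= tol
                   for th in thetas):
                witness_found = True
                break
        if not witness_found:
            return False
    return True


def identity(a: Hashable, b: Hashable) -> bool:
    return a == b  # only T' = T fits, i.e. precision = recall = 1


# Two tests; P2 mirrors P1 except that, in p2_slow, one step under "T1" is slower.
tests = ["T1", "T2"]
p1 = {"T1": [(0.5, (1.0, 2.0))], "T2": [(0.5, (1.0, 1.0))]}
p2_same = {"T1": [(0.5, (1.0, 2.0))], "T2": [(0.5, (1.0, 1.0))]}
p2_slow = {"T1": [(0.5, (1.0, 3.0))], "T2": [(0.5, (1.0, 1.0))]}
print(behaviorally_similar(tests, p1, p2_same, identity))
print(behaviorally_similar(tests, p1, p2_slow, identity))
```

With `p2_slow`, the sequence $(1.0, 2.0)$ taken from $\Theta(P_1, T1)$ admits the computation of $P_1$ but not the slower one of $P_2$, so the check fails; passing a looser `fits` predicate is how the precision and recall thresholds enter.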

The same considerations concerning the transitivity of Def. 3.15 still hold. With respect to the approximations
based on time and probability that have been discussed in the previous sections, in this
setting we deal with finite sets of tests and of sequences of average amounts of time. Hence, it is possible
to define a very intuitive, yet still decidable, approximation of $\sim_{\mathrm{MT}}$ based on time, probability, observed
behavior, and the three corresponding families of quantitative thresholds.

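Definition 3.18 Let $P_1, P_2 \in \mathcal{P}$ and let $\mathcal{T}_{\mathrm{R,c},\phi}$ be a finite set of tests. We say that $P_2$ is Markovian testing
similar to $P_1$ with precision $p \in [0,1]$, recall $r \in [0,1]$, temporal threshold $\varepsilon \in \mathbb{R}_{\ge 0}$, and probability
threshold $v \in \mathbb{R}_{\ge 0}$ iff for each reactive test $T \in \mathcal{T}_{\mathrm{R,c},\phi}$ there exists a reactive test $T' \in \mathcal{T}_{\mathrm{R,c},\phi}$ such that for
all sequences $\theta \in \Theta(P_1,T) \cup \Theta(P_2,T')$ of average amounts of time:

1. $prec(T,T') \ge p$ and $rec(T,T') \ge r$

2. $|prob(\mathcal{SC}^{\le\theta}(P_1,T)) - prob(\mathcal{SC}^{\le\theta}(P_2,T'))| \le v$. ■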

Given a modal logic formula $\phi$, we observe that $P_2$ (resp. $P_1$) is Markovian testing similar to $P_1$ (resp. $P_2$)
with precision 1, recall 1, and temporal and probability thresholds 0 if and only if $P_1 \sim_{\mathrm{MT}} P_2$ with respect to
the tests defined by $\phi$. It is worth noting that a unifying framework merging the three orthogonal aspects
(time, probability, and observed behavior) lays the basis for the analysis of the trade-off among them.

Example 3.19 Consider two process terms $P_1$ and $P_2$ that are defined as follows, respectively:

$\langle g, \gamma\rangle.\langle a, \lambda + \delta\rangle.\langle b, \lambda\rangle.0 + \langle g, \gamma\rangle.\langle a, \lambda\rangle.\langle d, \lambda\rangle.0$

$\langle g, \gamma\rangle.\langle a, \lambda\rangle.\langle d', \lambda\rangle.0 + \langle g, \gamma\rangle.\langle a, \lambda\rangle.\langle b, \lambda - \delta\rangle.0$

and compare them with respect to tests whose successful computation is described by the concrete trace
$g \circ a \circ *$, with $*$ any action. Then, $P_2$ is Markovian testing similar to $P_1$ with:

• precision and recall equal to each other and strictly lower than 1, where the difference in the observed behaviors is due to the
two concrete traces $g \circ a \circ d$ of $P_1$ and $g \circ a \circ d'$ of $P_2$, under the assumption $d \neq d'$;

• a temporal threshold $\varepsilon$ bounded from below by the difference in the average sojourn times, which is
due to the three rates $\lambda$, $\lambda + \delta$, and $\lambda - \delta$ labeling corresponding transitions along the two concrete
traces $g \circ a \circ b$ of $P_1$ and $P_2$;

• probability threshold 0, since the probabilities of the successful computations to compare are
always the same. ■
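The per-step differences in average sojourn times that the temporal threshold is meant to account for can be computed directly. The Python sketch below assumes hypothetical values $\lambda = 2$, $\delta = 1$, $\gamma = 3$ (any $\lambda > \delta > 0$ would do), takes the average sojourn time of a state to be the reciprocal of its total exit rate, and follows the branch of each process that performs $g \circ a \circ b$; how these differences are combined into the bound on $\varepsilon$ is left to the definition of the temporal threshold.

```python
from fractions import Fraction
from typing import List


def sojourn_times(exit_rates: List[Fraction]) -> List[Fraction]:
    """Average sojourn time of each visited state: reciprocal of its total exit rate."""
    return [1 / r for r in exit_rates]


# Hypothetical rates; lam > delta keeps lam - delta a positive rate.
lam, delta, gamma = Fraction(2), Fraction(1), Fraction(3)

# States visited along g, a, b: the initial state races two <g, gamma> branches,
# then each branch has a single outgoing transition per state.
p1 = sojourn_times([2 * gamma, lam + delta, lam])
p2 = sojourn_times([2 * gamma, lam, lam - delta])

# Step-by-step differences between corresponding states of P1 and P2.
diffs = [abs(x - y) for x, y in zip(p1, p2)]
print(diffs)
```

With these values the first step contributes no difference, while the $a$ and $b$ steps differ by $\frac{1}{\lambda} - \frac{1}{\lambda + \delta} = \frac{1}{6}$ and $\frac{1}{\lambda - \delta} - \frac{1}{\lambda} = \frac{1}{2}$, respectively.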

4 Related and Future Work 

In the last decade several approaches to the approximation of behavioral equivalences have been proposed 
(see, e.g., lEl HU |20l [El 11 El [HI pl| and the references therein). 

Some of them use a well-established approach based on behavioral pseudometrics ifTTl [20], which
give a measure of the similarity between states of a transition system. These pseudometrics provide a 
conservative extension of bisimulation equivalence. Hence, they cannot be compared with the notions 
of testing similarity, which instead rely on testing semantics. With approaches based on pseudometrics 
it is not easy to establish a clear relation between the measure estimating process similarity and its 
interpretation in a practical, mainly activity oriented, setting. As an example elucidating this aspect, S 
shows the importance of evaluating the impact that the absence of an equivalence relating two process 
terms has upon their difference with respect to performability measures. However, this is done without
defining explicitly an approximate equivalence relating these measures with the degree of similarity. 

Some other approaches that are not based on pseudometrics, like pi fT2l 14], rely on relations ap- 
proximating bisimulation equivalence. These approaches seem promising thanks to the strict relation 
between bisimulation and lumping for Markov chains @. Indeed, the characterization of lumpability is 
extremely useful, because the knowledge of a lumpable partition of the states of a Markov chain allows
the generation of an aggregated Markov chain that is smaller than the original one, but from which several
results for the original Markov chain can be derived without any error. In this setting, there exist approximation
techniques based on relaxed notions of lumping and on perturbation theory that establish bounds on the
error made when approximating. This is particularly useful because these bounds are directly related
to the numerical analysis of Markov chains and, therefore, immediately provide a clear interpretation
of their impact upon the quantitative behavior of the process terms under analysis. However, it seems that
there still exists a significant gap between the applicability of the approximate bisimulations mentioned 
above and their decidability. Very often, the (strict) assumptions underlying approximate bisimulation 
that are needed to define efficient verification algorithms are such that it becomes hard to find real
application domains and, in particular, to give a natural interpretation of the degree of similarity. On the other
hand, the definition of an approximate bisimulation that can be related to approximate lumping and has
an efficient verification algorithm is still an open problem.

By contrast, the approach proposed in [5] does not rely on behavioral equivalences, since it is based
on the estimation of observed behaviors, quantified through a notion of fitness that does not require
any nonfunctional information such as time and probability, whenever log-driven computations are
compared. However, this estimation is not related to any notion of behavioral equivalence.

The main result of this paper is to show that testing equivalence offers an ideal semantic framework
for combining ideas taken from approximate behavioral equivalences with the approach of [5]. In addition,
the proposed definitions of approximation elucidate the role of each aspect under consideration (time,
probability, and observed behavior) without sacrificing either decidability or usability.

As future work, it would be interesting to investigate the relation between the estimations provided by
approximate Markovian testing equivalence and T-lumpability [2], which is the version of lumpability
corresponding to Markovian testing equivalence. Such a result would enhance the applicability to
domains where the degree of similarity must be interpreted in terms of its impact upon the performance
behavior of systems.

The application to real examples will be the subject of further investigations. For instance, it is 
well-known that approximate equivalence checking can be profitably employed in the setting of nonin- 
terference analysis. Basically, one user/component may affect the behavior of other users/components 
in a way that compromises system properties like security and safety. Such an impact is studied by 
comparing the two views of the system that are obtained by activating and deactivating, respectively, the 
behavior of the interfering user/component. This approach is illustrated and used in [2] for the evaluation 
of performability aspects of several real-world case studies, like a secure routing system and a power- 
manageable system. In this setting, the goal is to use Markovian testing similarity to compare different 
system views with respect to families of properties formalized through modal logic formulas. The com- 
parison is intended to distinguish which observable behaviors make these views different from functional, 
temporal, and probabilistic perspectives, each case accompanied by a measure of such a difference. 